Towards Computation, Space, and Data Efficiency in de novo DNA Assembly: A Novel Algorithmic Framework
Abstract
We consider the problem of de novo DNA sequencing from shotgun data, in which an underlying (unknown) DNA sequence must be reconstructed from many short substrings (reads) of that sequence. We propose a de novo assembly algorithm that requires only the minimum amount of data and is efficient in both space and computation. The algorithm is designed from the information-theoretic perspective of using the minimum amount of data. The key idea for achieving space and computational efficiency is to split the procedure into two phases: an online phase and an offline phase. This work can serve as evidence that an information-theoretic perspective is a feasible guide for practical algorithm design in DNA sequencing. Preliminary work on extending this algorithmic framework to more realistic settings is also reported.
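The abstract does not spell out the algorithm, so the following is only a minimal sketch of what an online/offline split can look like. It uses a generic greedy longest-overlap strategy chosen here for illustration; it is not the paper's method. It assumes error-free reads of one fixed length, overlaps of at least `k` bases between consecutive reads, and no repeats long enough to create false maximal overlaps. The class `TwoPhaseAssembler` and its methods `add_read` and `assemble` are hypothetical names. In the online phase each incoming read costs one set lookup and one hash-table insert keyed by its prefix k-mer; all overlap detection and merging is deferred to the offline phase.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class TwoPhaseAssembler:
    """Hypothetical toy assembler: cheap online indexing, then offline greedy merging.

    Assumes error-free reads of one fixed length, true overlaps of at least k
    bases, and no repeats long enough to create false maximal overlaps.
    """

    def __init__(self, k: int) -> None:
        self.k = k
        self.reads: List[str] = []
        self.seen: Set[str] = set()
        # prefix k-mer -> ids of stored reads that begin with that k-mer
        self.prefix_index: Dict[str, List[int]] = defaultdict(list)

    def add_read(self, read: str) -> None:
        """Online phase: one set lookup and one index insert per incoming read."""
        if read in self.seen:  # exact duplicates add no new information
            return
        self.seen.add(read)
        self.prefix_index[read[: self.k]].append(len(self.reads))
        self.reads.append(read)

    def _overlaps(self) -> List[Tuple[int, int, int]]:
        """All (overlap_length, a, b) where a suffix of read a is a prefix of read b."""
        found = []
        for a, ra in enumerate(self.reads):
            # start positions i leave a suffix ra[i:] of length >= k
            for i in range(1, len(ra) - self.k + 1):
                for b in self.prefix_index.get(ra[i : i + self.k], []):
                    if b != a and self.reads[b].startswith(ra[i:]):
                        found.append((len(ra) - i, a, b))
        return found

    def assemble(self) -> List[str]:
        """Offline phase: accept the longest overlaps first, never closing a cycle."""
        n = len(self.reads)
        parent = list(range(n))  # union-find over reads to detect cycles

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        succ: Dict[int, Tuple[int, int]] = {}  # read id -> (next read id, overlap length)
        has_pred = [False] * n
        for olen, a, b in sorted(self._overlaps(), reverse=True):
            # skip if a is already extended, b already preceded, or the edge closes a cycle
            if a in succ or has_pred[b] or find(a) == find(b):
                continue
            succ[a] = (b, olen)
            has_pred[b] = True
            parent[find(a)] = find(b)

        contigs = []
        for start in range(n):
            if has_pred[start]:
                continue  # only chain heads start a contig
            contig, cur = self.reads[start], start
            while cur in succ:
                cur, olen = succ[cur]
                contig += self.reads[cur][olen:]
            contigs.append(contig)
        return contigs


if __name__ == "__main__":
    genome = "ATGCGTACCTGA"
    reads = [genome[i : i + 5] for i in range(len(genome) - 4)]
    asm = TwoPhaseAssembler(k=3)
    for r in reads[::-1] + [reads[3]]:  # arbitrary arrival order plus one duplicate
        asm.add_read(r)
    print(asm.assemble())  # ['ATGCGTACCTGA']
```

Greedy longest-overlap merging is used only because it is short to write; it can misassemble genomes containing repeats longer than the read overlaps, and if the reads leave coverage gaps it returns several contigs instead of one.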
Similar resources
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole-transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using a de Bruijn graph with d...
Evolutionary algorithms and de novo peptide design
Automated de novo design of bioactive molecules is one of the aspirational goals of computational chemistry. Despite significant progress in computational approaches to ligand design and in efficient evaluation of binding energy, novel ligand design procedures are still required. Evolutionary computation provides a new approach to this design issue. This paper proposes a framework for evolving ligands...
Finding Exact and Solo LTR-Retrotransposons in Biological Sequences Using SVM
Finding repetitive subsequences in a genome is a challenging problem in bioinformatics research. Many approaches have been proposed to solve it, and they can be divided into library-based and de novo methods. Library-based methods use predetermined repetitive genome subsequences, whereas library-less methods attempt to discover repetitive subsequences by an analytical approach...
De Novo Ultrascale Atomistic Simulations on High-End Parallel Supercomputers
We present a de novo hierarchical simulation framework for first-principles-based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macr...
P-70: Evidence for Differential Gene Expression of a Major Epigenetic Modifier Enzyme, de novo DNA Methyltransferase 3b, through Vitrification of Mouse Ovary Tissue
Background: Ovarian tissue cryopreservation is a feasible method to preserve female reproductive potential, especially in young patients with cancer or in women at risk of premature ovarian failure. Vitrification has recently emerged as a new trend for biological specimen preservation. On the other hand, gene expression that changes during vitrification can influence oocyte maturation and need ...
Publication date: 2013